Part 1: wrap up base R
Finish subsetting (matrices, factors and data.frames)
Subsetting and assignment
Part 2: modern R basics
Visualization in base R
Visualization in ggplot2 (intro)
April 14, 2020
Click here and look for the folder with your name. There you will find the two files (Rmd and html) you are to provide a peer review for.
You must be signed in with your edu.pdx account to access the document.
Type your peer evaluation in Word, referring either to the line in the Rmd file or to the specific homework problem number you are alluding to.
Again, be constructive and considerate. Score the homework on a continuous scale from 0 to 3, using the following as a reference:
0 - No homework turned in.
1 - Turned in but low effort, poorly presented with nonfunctional code and ignoring directions.
2 - Decent effort, well presented; all code works and directions were followed, with some minor issues.
3 - Nailed it!
Since R was developed explicitly for statistics, it comes with extensive plotting capabilities by default.
MANY plotting functions live in the graphics package, which ships with base R.
Look into the help files for the functions in this package using library(help = "graphics").
Here are a few examples of visualization functions in the graphics package:
plot, lines, points, abline, boxplot, pairs, matplot, barplot, curve, dotchart, pie, rasterImage, coplot, cdplot, mosaicplot, polygon
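One way to check that these functions really come from graphics (rather than from some other package you loaded) is to list what the attached package exports. A minimal sketch; plot is left out of the check because, since R 4.0.0, its generic lives in base and is only re-exported by graphics.

```r
# names exported by the attached graphics package
graphics_fns <- ls("package:graphics")
length(graphics_fns)  # the exact count depends on your R version

# the functions listed above (minus plot, see the note above)
listed <- c("lines", "points", "abline", "boxplot", "pairs", "matplot",
            "barplot", "curve", "dotchart", "pie", "rasterImage", "coplot",
            "cdplot", "mosaicplot", "polygon")
all(listed %in% graphics_fns)  # TRUE if every one is found in graphics
```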
plot
plot is a generic function for plotting R objects.
Functions in R can be designed so that the same function behaves completely differently depending on the class of the object it is applied to (this is called method dispatch).
Here is a list of the methods available for plot, one for each class of object it knows how to handle:
methods(plot)
##  [1] plot,ANY-method     plot,color-method   plot.acf*
##  [4] plot.data.frame*    plot.decomposed.ts* plot.default
##  [7] plot.dendrogram*    plot.density*       plot.ecdf
## [10] plot.factor*        plot.formula*       plot.function
## [13] plot.ggplot*        plot.gtable*        plot.hcl_palettes*
## [16] plot.hclust*        plot.histogram*     plot.HoltWinters*
## [19] plot.isoreg*        plot.lm*            plot.medpolish*
## [22] plot.mlm*           plot.ppr*           plot.prcomp*
## [25] plot.princomp*      plot.profile.nls*   plot.R6*
## [28] plot.raster*        plot.spec*          plot.stepfun
## [31] plot.stl*           plot.table*         plot.trans*
## [34] plot.ts             plot.tskernel*      plot.TukeyHSD*
## see '?methods' for accessing help and source code
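To see dispatch in action without involving graphics, here is a small sketch of an S3 generic. The generic describe and its two methods are hypothetical, written only for illustration; the mechanism (UseMethod picking a function named generic.class, falling back to generic.default) is the same one plot relies on. (The asterisks in the list above mark methods that are not exported; they can still be inspected with getAnywhere().)

```r
# hypothetical generic 'describe', used only to illustrate S3 dispatch
describe <- function(x, ...) UseMethod("describe")

# method chosen when x is a data.frame
describe.data.frame <- function(x, ...) {
  paste("data frame with", nrow(x), "rows and", ncol(x), "columns")
}

# fallback chosen for any class without its own method
describe.default <- function(x, ...) {
  paste("object of class", class(x)[1], "with length", length(x))
}

describe(cars)        # "data frame with 50 rows and 2 columns"
describe(cars$speed)  # "object of class numeric with length 50"
```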
When given two numeric vectors, plot dispatches to its default method (plot.default), which generates a scatterplot.
Its general structure is plot(x, y, ...), where
x is the variable on the x-axis,
y is the variable on the y-axis, and
... represents other graphical parameters (see ?par for an extensive list)
Let’s do an example to see some of the options
cars dataset
Let’s load the built-in dataset cars, which loads as a data frame, a type of object mentioned earlier. Then we can look at it in a couple of different ways.
data(cars) loads this data frame into the Global Environment as a promise. A promise is an expression whose evaluation is deferred until its value is first needed (function arguments in R are promises too).
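The same deferred-evaluation mechanism can be reproduced by hand with base R's delayedAssign(). This is only an illustration of what a promise is, not necessarily how data() is implemented internally; the name lazy_head is made up for the example.

```r
# delayedAssign() binds a name to an unevaluated expression (a promise)
delayedAssign("lazy_head", {
  message("evaluating the promise now")
  head(cars, 2)
})

# nothing has run yet; the message appears on first access only
nrow(lazy_head)  # prints the message, then returns 2
nrow(lazy_head)  # already evaluated: no message this time
```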
str(cars, 5)
## 'data.frame':    50 obs. of  2 variables:
##  $ speed: num  4 4 7 7 8 9 10 10 10 11 ...
##  $ dist : num  2 10 4 22 16 10 18 26 34 17 ...
head(cars,4) # prints first 4 rows
##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
summary(cars) # summary stats for each var
##      speed           dist
##  Min.   : 4.0   Min.   :  2.00
##  1st Qu.:12.0   1st Qu.: 26.00
##  Median :15.0   Median : 36.00
##  Mean   :15.4   Mean   : 42.98
##  3rd Qu.:19.0   3rd Qu.: 56.00
##  Max.   :25.0   Max.   :120.00
plot options
plot(x = cars$speed, y = cars$dist,
xlab = "Speed (mph)",
ylab = "Stopping distance (ft)",
main = "Speeds and stopping distances of cars",
type = "p",
lty = 1, lwd = 1,
pch = 16,
cex = 1, cex.axis = 1, cex.lab = 1,
col = "firebrick")
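The type argument controls how the points are joined. A quick way to compare its common values is to draw one small panel per value; this sketch reuses the cars data and the par(mfrow = ...) mechanism, saving and restoring the previous settings.

```r
# one small panel for each common value of 'type'
types <- c("p", "l", "b", "o", "h", "s")
old.par <- par(mfrow = c(2, 3), mai = c(0.4, 0.4, 0.3, 0.1))
for (tp in types) {
  plot(x = cars$speed, y = cars$dist, type = tp,
       main = paste0('type = "', tp, '"'),
       xlab = "", ylab = "", pch = 16, col = "firebrick")
}
par(old.par)  # restore the single-panel layout
```

"p" draws points, "l" lines, "b" and "o" both (separated vs. overplotted), "h" vertical histogram-like bars and "s" steps; type = "n" sets up the axes but draws nothing.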
plot options: cex
obj.size <- seq(0.5,5,length.out = length(cars$speed))
plot(x = cars$speed, y = cars$dist,
xlab = "Speed (mph)",
ylab = "Stopping distance (ft)",
main = "Speeds and stopping distances of cars",
type = "p",
lty = 1, lwd = 1,
pch = 16,
cex = obj.size, cex.axis = 1, cex.lab = 1,
col = "firebrick")
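In the plot above, obj.size grows with the row number, which tracks speed only because cars happens to be sorted by speed. To make point size encode a variable directly, one option is to rescale that variable onto a sensible cex range. A sketch, where rescale_to is a hypothetical helper written for this example:

```r
# hypothetical helper: linearly rescale a numeric vector onto [lo, hi]
rescale_to <- function(v, lo = 0.5, hi = 3) {
  lo + (v - min(v)) / (max(v) - min(v)) * (hi - lo)
}

# point size now encodes stopping distance rather than row position
pt.size <- rescale_to(cars$dist)
plot(x = cars$speed, y = cars$dist, pch = 16, cex = pt.size,
     col = "firebrick",
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
```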
plot options: col
# colors assigned by position: this works only because cars is sorted by speed
col.vec <- rep(c("firebrick","forestgreen","cornflowerblue"),
times=c(sum(cars$speed<10),
sum(cars$speed>=10&cars$speed<17),
sum(cars$speed>=17)))
plot(x = cars$speed, y = cars$dist,
xlab = "Speed (mph)",
ylab = "Stopping distance (ft)",
main = "Speeds and stopping distances of cars",
type = "p",
lty = 1, lwd = 1,
pch = 16,
cex = 1, cex.axis = 1, cex.lab = 1,
col = col.vec)
legend
plot(x = cars$speed, y = cars$dist,
xlab = "Speed (mph)",
ylab = "Stopping distance (ft)",
main = "Speeds and stopping distances of cars",
type = "p",
lty = 1, lwd = 1,
pch = 16,
cex = 1, cex.axis = 1, cex.lab = 1,
col = col.vec)
legend("topleft",
bty = "n",
pch=c(16,16,16),
col=c("firebrick","forestgreen","cornflowerblue"),
legend=c("speed<10","10<=speed<17","speed>=17"))
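A sketch of an order-independent alternative: cut() assigns each speed to a bin by value, so the colors stay correct even if the rows are shuffled, and the legend labels can be taken straight from the factor levels. The names speed.bin, pal and col.cut are introduced here for illustration.

```r
# bin speeds by value; right = FALSE gives [-Inf, 10), [10, 17), [17, Inf)
speed.bin <- cut(cars$speed, breaks = c(-Inf, 10, 17, Inf), right = FALSE,
                 labels = c("speed<10", "10<=speed<17", "speed>=17"))
pal <- c("firebrick", "forestgreen", "cornflowerblue")
col.cut <- pal[as.integer(speed.bin)]  # one color per observation

plot(x = cars$speed, y = cars$dist, pch = 16, col = col.cut,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
# legend entries come from the factor levels, so they always match the bins
legend("topleft", bty = "n", pch = 16, col = pal, legend = levels(speed.bin))
```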
Your task is to play with the swiss dataset built into R for 20 minutes.
Use ?swiss to see what things mean in the dataset
Go to the in-class exercise Rmd document you started working on Tuesday
Load the data using data(swiss)
Think of and write down in your Rmd document one or two questions you’d like to explore with these data
Use the function plot to explore your questions and make 2 or 3 nicely formatted plots with the options we discussed so far (include legends, play with col, cex, type)
boxplot
par(mfrow=c(1,2),mai=c(1,0.5,0.1,0.1))
boxplot(decrease ~ treatment, data = OrchardSprays, col = "cornflowerblue",
log = "y",cex.axis=0.7,cex.lab=0.7,notch=FALSE)
## horizontal=TRUE, switching y <--> x :
boxplot(decrease ~ treatment, data = OrchardSprays, col = "cornflowerblue",
log = "x", horizontal=TRUE,cex.axis=0.7,cex.lab=0.7,notch=FALSE)
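boxplot also returns the numbers it draws. With plot = FALSE nothing is drawn and you get the statistics back as a list, which is handy for checking what a box actually represents. A minimal sketch on the same OrchardSprays data:

```r
# plot = FALSE returns the statistics without drawing anything
bx <- boxplot(decrease ~ treatment, data = OrchardSprays, plot = FALSE)
bx$names  # group labels (one per treatment)
bx$n      # number of observations per group
bx$stats  # 5 rows: lower whisker, lower hinge, median, upper hinge, upper whisker
bx$out    # values drawn as outliers, if any
```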
curve
par(mfrow=c(1,3),mai=c(0.9,0.4,0.1,0.1))
curve(expr=sin, from=-2*pi, to=2*pi, xname = "t",cex.axis=0.7, cex.lab=0.7)
curve(expr=tan, xname = "t", from=-2*pi, to=2*pi, cex.axis=0.7, cex.lab=0.7)
myfn <- function(t){tan(t)*sin(t)}
curve(expr=myfn, xname = "t", from=-2*pi, to=2*pi, cex.axis=0.7, cex.lab=0.7)
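curve also accepts an expression written in terms of x (the default xname), and add = TRUE draws onto the existing plot instead of starting a new one. It invisibly returns the points it evaluated, which this sketch captures in res.

```r
# an expression in x, evaluated at n equally spaced points
res <- curve(sin(x), from = 0, to = 2 * pi, n = 201,
             col = "firebrick", ylab = "")
# overlay a second function on the same axes
curve(cos(x), from = 0, to = 2 * pi, add = TRUE,
      col = "cornflowerblue", lty = 2)
length(res$x)  # 201 evaluation points
```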
hist
par(mfrow=c(1,2),mai=c(1,0.5,0.1,0.1))
x <- rchisq(1000, df = 4)
hist(x, freq = FALSE, ylim = c(0, 0.2),col="orange",main="")
# histogram with a density curve overlaid (lines() needs no add argument)
hist(x, freq = FALSE, ylim = c(0, 0.2),col="orange",main="")
lines(density(x,from=0, to=20), col = "blue3", lty = 1, lwd = 3)
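Because the sample is drawn from a known distribution, the estimated density can be compared with the true chi-square(4) density using curve(..., add = TRUE). A sketch; set.seed is added only to make the sample reproducible. Note that inside curve, x refers to curve's own grid of points, not to the sample x.

```r
set.seed(1)  # reproducible sample
x <- rchisq(1000, df = 4)
hist(x, freq = FALSE, ylim = c(0, 0.2), col = "orange", main = "")
# kernel density estimate from the sample
lines(density(x, from = 0, to = 20), col = "blue3", lwd = 3)
# theoretical chi-square(4) density; here x is curve's evaluation grid
curve(dchisq(x, df = 4), from = 0, to = 20, add = TRUE, lty = 2, lwd = 2)
```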
pairs
par(mai=c(0.1,0.1,0.1,0.1))
pairs(iris[1:3], cex=0.5, cex.labels = 1,cex.axis=0.7,
pch = 21, bg = c("red", "green3", "blue")[unclass(iris$Species)])
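pairs lets you choose what is drawn in each triangle of the matrix. This sketch uses panel.smooth (from graphics) to add a lowess smoother in the lower triangle and leaves the redundant upper triangle empty with upper.panel = NULL.

```r
# one color per species, as in the example above
sp.col <- c("red", "green3", "blue")[unclass(iris$Species)]
# lower triangle: points plus a lowess smoother; upper triangle: empty
pairs(iris[1:3], lower.panel = panel.smooth, upper.panel = NULL,
      col = sp.col, pch = 16, cex = 0.5)
```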
dotchart
VADeaths
##       Rural Male Rural Female Urban Male Urban Female
## 50-54       11.7          8.7       15.4          8.4
## 55-59       18.1         11.7       24.3         13.6
## 60-64       26.9         20.3       37.0         19.3
## 65-69       41.0         30.9       54.6         35.1
## 70-74       66.0         54.3       71.1         50.0
par(mai=c(0.4,0.1,0.4,0.1))
dotchart(VADeaths, bg = "skyblue",
cex=0.7, cex.axis=0.1,
main = "Death Rate VA - 1940")
matplot
par(mfrow=c(1,2),mai=c(0.4,0.4,0.1,0.1))
sines <- outer(1:20, 1:4, function(x, y) sin(x / 20 * pi * y))
matplot(sines, pch = 1:4, type = "o", col = rainbow(ncol(sines)),
cex=0.5, cex.axis=0.7)
matplot(sines, type = "b", pch = 21:23, col = 2:5, bg = 2:5,
cex=0.5, cex.axis=0.7, main = "")
barplot
par(mfrow=c(1,2),mai=c(0.9,0.4,0.1,0.1))
barplot(GNP ~ Year, data = longley, cex=0.5, cex.axis=0.7, cex.lab=0.7)
barplot(cbind(Employed, Unemployed) ~ Year, data = longley, cex = 0.5,
cex.axis=0.7,cex.lab=0.7)
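barplot also takes a matrix directly: by default the rows of each column are stacked, and beside = TRUE places them side by side instead. A sketch with the VADeaths matrix; barplot returns the bar midpoints, which is useful for adding text or arrows later.

```r
# each column of VADeaths becomes a group, each row (age band) one bar
bp <- barplot(VADeaths, beside = TRUE, col = hcl.colors(5),
              legend.text = rownames(VADeaths),
              args.legend = list(x = "topleft", bty = "n", cex = 0.7),
              cex.names = 0.7, cex.axis = 0.7)
dim(bp)  # midpoints: one row per age band, one column per group
```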
mosaic
par(mfrow=c(1,1),mai=c(0.9,0.4,1,0.4))
mosaicplot(~ Sex + Age + Survived, data = Titanic, main="",color = TRUE)
mfrow (or mfcol)
par(mfrow=c(1,3))
plot(x = cars$speed, y = cars$dist,xlab = "", ylab = "", main = "",
type = "p", pch = 16, col = "firebrick")
plot(x = cars$speed, y = cars$dist,xlab = "", ylab = "", main = "",
type = "p", pch = 16, col = "firebrick")
abline(reg=lm(dist~speed,data=cars),col="forestgreen")
plot(x = cars$speed, y = cars$dist,xlab = "", ylab = "", main = "",
type = "p", pch = 16, col = "firebrick")
lines(lowess(cars),col="cornflowerblue")
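The difference between the two settings is the fill order: mfrow fills the grid row by row, mfcol column by column. par() returns the old values of whatever it changes, so saving that result and passing it back restores a single panel afterwards. A sketch:

```r
# mfcol: panels 1 and 2 fill the first column, 3 and 4 the second
op <- par(mfcol = c(2, 2), mai = c(0.4, 0.4, 0.3, 0.1))
for (i in 1:4) {
  plot(cars$speed, cars$dist, pch = 16, col = "firebrick",
       main = paste("panel", i), xlab = "", ylab = "")
}
par(op)  # restore the previous settings so later plots use one panel
```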
layout function
layout(mat,
widths = rep.int(1, ncol(mat)),
heights = rep.int(1, nrow(mat)),
respect = FALSE)
nf <- layout(matrix(c(2,0,1,3),2,2,byrow = TRUE), widths=c(3,1), heights=c(1,3), respect = TRUE)
layout.show(nf)
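The layout matrix above reserves a large cell for figure 1 with narrow strips on top (figure 2) and to the right (figure 3), the classic arrangement for a scatterplot with marginal summaries. A sketch filling it with the cars data; the choice of a histogram and a boxplot for the margins is ours, and the margins line up with the scatterplot only approximately.

```r
x <- cars$speed
y <- cars$dist
op <- par(no.readonly = TRUE)  # save settings to restore afterwards

nf <- layout(matrix(c(2, 0, 1, 3), 2, 2, byrow = TRUE),
             widths = c(3, 1), heights = c(1, 3), respect = TRUE)

# figure 1: scatterplot in the large bottom-left cell
par(mar = c(4, 4, 0.5, 0.5))
plot(x, y, pch = 16, col = "firebrick",
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")

# figure 2: histogram of speed in the top cell, axes suppressed
par(mar = c(0, 4, 0.5, 0.5))
hist(x, main = "", xlab = "", ylab = "", axes = FALSE, col = "grey80")

# figure 3: boxplot of stopping distance in the right cell
par(mar = c(4, 0, 0.5, 0.5))
boxplot(y, axes = FALSE, col = "grey80")

par(op)  # restore the saved graphical parameters
```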
Extend your exploration of swiss by using two or three of the figure types discussed after plot.
Make a figure with two panels (1 row by 2 columns) using mfrow.
Make a figure with three different plots using layout.